Kandinsky 3.0 Technical Report
Arkhipkin, Vladimir, Filatov, Andrei, Vasilev, Viacheslav, Maltseva, Anastasia, Azizov, Said, Pavlov, Igor, Agafonova, Julia, Kuznetsov, Andrey, Dimitrov, Denis
We present Kandinsky 3.0, a large-scale text-to-image generation model based on latent diffusion, continuing the series of text-to-image Kandinsky models and reflecting our progress towards higher quality and realism of image generation. Compared to previous versions of Kandinsky 2.x, Kandinsky 3.0 leverages a two times larger U-Net backbone and a ten times larger text encoder, and removes the diffusion mapping. We describe the architecture of the model, the data collection procedure, the training technique, and the production system for user interaction. We focus on the key components that, based on a large number of experiments, had the most significant impact on improving the quality of our model compared to others. In our side-by-side comparisons, Kandinsky 3.0 shows better text understanding and performs better on specific domains. Project page: https://ai-forever.github.io/Kandinsky-3
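The report describes the model only at the architecture level; as a minimal sketch of actually running it, the snippet below assumes the publicly released checkpoint can be loaded through Hugging Face diffusers. The repo id "kandinsky-community/kandinsky-3" and the AutoPipelineForText2Image entry point are assumptions about the public release, not details from the report.

```python
# Minimal sketch: text-to-image generation with Kandinsky 3.0 via diffusers.
# Repo id and pipeline class are assumptions about the public release.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    torch_dtype=torch.float16,
).to("cuda")

# Unlike Kandinsky 2.x there is no separate diffusion prior/mapping stage:
# the text encoder conditions the latent-diffusion U-Net directly.
image = pipe(
    "a red cat sitting on a windowsill, watercolor",
    num_inference_steps=25,
    guidance_scale=3.0,
).images[0]
image.save("kandinsky3_sample.png")
```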
Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion
Shipard, Jordan, Wiliem, Arnold, Thanh, Kien Nguyen, Xiang, Wei, Fookes, Clinton
In this work, we investigate the problem of Model-Agnostic Zero-Shot Classification (MA-ZSC), which refers to training non-specific classification architectures (downstream models) to classify real images without using any real images during training. Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC. However, the performance of this approach currently falls short of that achieved by large-scale vision-language models. One possible explanation is a potential significant domain gap between synthetic and real images. Our work offers a fresh perspective on the problem by providing initial insights that MA-ZSC performance can be improved by increasing the diversity of images in the generated dataset. We propose a set of modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity, which we refer to as our $\textbf{bag of tricks}$. Our approach shows notable improvements in various classification architectures, with results comparable to state-of-the-art models such as CLIP. To validate our approach, we conduct experiments on CIFAR10, CIFAR100, and EuroSAT, which is particularly difficult for zero-shot classification due to its satellite image domain. We evaluate our approach with five classification architectures, including ResNet and ViT. Our findings provide initial insights into the problem of MA-ZSC using diffusion models. All code will be available on GitHub.
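The abstract does not spell out the individual tricks, so the sketch below only illustrates the general idea under stated assumptions: building a synthetic training set with a pre-trained Stable Diffusion checkpoint while randomizing prompt templates and guidance scale to increase diversity. The repo id, templates, and class list are illustrative, not the authors' settings.

```python
# Illustrative only: one way to inject diversity (random prompt templates and
# guidance scales) when generating a synthetic dataset for MA-ZSC-style training.
import os
import random
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

classes = ["airplane", "automobile", "bird", "cat", "deer"]  # e.g. a CIFAR10 subset
templates = [
    "a photo of a {}",
    "a blurry photo of a {}",
    "a painting of a {} in the wild",
    "a low-resolution picture of a {} outdoors",
]

os.makedirs("synthetic", exist_ok=True)
for cls in classes:
    for i in range(4):                                  # images per class (toy number)
        prompt = random.choice(templates).format(cls)
        scale = random.uniform(2.0, 9.0)                # randomized guidance strength
        img = pipe(prompt, guidance_scale=scale, num_inference_steps=30).images[0]
        img.save(f"synthetic/{cls}_{i}.png")
```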
Investigating Prompt Engineering in Diffusion Models
Witteveen, Sam, Andrews, Martin
With the spread of the use of Text2Img diffusion models such as DALL-E 2, Imagen, Midjourney and Stable Diffusion, one challenge that artists face is selecting the right prompts to achieve the desired artistic output. We present techniques for measuring the effect that specific words and phrases in prompts have, and (in the Appendix) present guidance on the selection of prompts to produce desired effects.
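The abstract does not detail the measurement technique, so the following is only a hedged sketch of one plausible way to quantify a phrase's effect: generate images with and without a modifier under a fixed seed and compare CLIP image embeddings. The model ids and the example modifier are illustrative assumptions, not the authors' protocol.

```python
# Sketch: measure how much a prompt modifier shifts the generated image by
# comparing CLIP image embeddings of outputs with and without the phrase.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image):
    inputs = proc(images=image, return_tensors="pt")
    with torch.no_grad():
        return clip.get_image_features(**inputs)

base = "a portrait of an old sailor"
modified = base + ", in the style of Rembrandt"

gen = torch.Generator("cuda").manual_seed(0)
img_a = pipe(base, generator=gen).images[0]
gen = torch.Generator("cuda").manual_seed(0)
img_b = pipe(modified, generator=gen).images[0]

sim = torch.nn.functional.cosine_similarity(embed(img_a), embed(img_b))
print(f"CLIP similarity with vs. without the modifier: {sim.item():.3f}")
```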
Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models
Kleinlein, Ricardo, Luna-Jiménez, Cristina, Fernández-Martínez, Fernando
The impressive capacity shown by recent text-to-image diffusion models to generate high-quality pictures from textual input prompts has fueled the debate about the very definition of art. Nonetheless, these models have been trained using text data collected from content-based labelling protocols that focus on describing the items and actions in an image but neglect any subjective appraisal. Consequently, these automatic systems need rigorous descriptions of the elements and the pictorial style of the image to be generated, and fail to deliver otherwise. As potential indicators of the actual artistic capabilities of current generative models, we characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models. Considering the sharp difference observed between their language style and that typically employed in artistic contexts, we suggest generative models should incorporate additional sources of subjective information in their training in order to overcome (or at least alleviate) some of their current limitations, thus effectively unleashing a truly artistic and creative generation.
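The paper's concrete metrics are not given in this abstract; as a rough illustration only, sentiment polarity and subjectivity of caption text can be scored with an off-the-shelf tool such as TextBlob, as sketched below. The example captions are made up and the scoring method is an assumption, not the authors' measure.

```python
# Not the paper's actual metrics -- a simple illustration of scoring caption text
# for polarity (sentimentality) and subjectivity with TextBlob.
from textblob import TextBlob

captions = [
    "a man riding a bicycle down a street",                   # typical descriptive caption
    "the loneliest road I have ever seen, heavy with dusk",   # more "artistic" phrasing
]

for text in captions:
    s = TextBlob(text).sentiment
    print(f"{text!r}: polarity={s.polarity:+.2f}, subjectivity={s.subjectivity:.2f}")
```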
How Does DALL·E-2 Work?
DALL·E-2 is a new AI system that can create realistic images and art from a description in natural language. OpenAI has recently released the beta version of DALL·E-2. In this article, we will take a close look at the original research paper behind DALL·E-2 and understand how exactly it works. DALL·E-2 originates from the paper Hierarchical Text-Conditional Image Generation with CLIP Latents [1] and is based on the unCLIP model proposed there.
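DALL·E-2 itself is not publicly released, so the sketch below uses Karlo, an open re-implementation of the same unCLIP recipe, to show the two-stage structure (a prior followed by a diffusion decoder). The checkpoint id and the UnCLIPPipeline class are assumptions about the open model in diffusers, not something stated in the article.

```python
# Sketch of the unCLIP recipe using the open Karlo checkpoint (DALL-E 2 itself
# is not released). Two stages: a prior maps the CLIP text embedding to a CLIP
# image embedding; a diffusion decoder generates and upsamples the image from it.
import torch
from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse in a photorealistic style").images[0]
image.save("unclip_sample.png")
```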
OpenAI's new image generator sparks both excitement and fear
OpenAI has unveiled a new AI tool that turns text into images -- and the results are stunning. Named DALL-E 2, the system is the successor to a model unveiled last year. While its predecessor generated some impressive outputs, the new version is a major upgrade. DALL-E 2 adds enhanced textual comprehension, faster image generation, and four times greater resolution. "When approaching DALL-E 2 we focused on improving the image resolution quality and improving latency, rather than building a bigger system," OpenAI researcher Aditya Ramesh told TNW.
DALL·E: Creating Images from Text
DALL·E[1] is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We've found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images. GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks. Image GPT showed that the same type of neural network can also be used to generate images with high fidelity. We extend these findings to show that manipulating visual concepts through language is now within reach.
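As a purely conceptual sketch (not OpenAI's code), the idea of a GPT-style model generating images can be illustrated by a transformer that autoregressively models a sequence of text tokens followed by discrete image tokens, so image generation is just next-token prediction past the caption. The vocabulary sizes, sequence lengths, and model dimensions below are toy values chosen for illustration.

```python
# Toy illustration of DALL-E's autoregressive text+image token modeling.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192   # illustrative sizes, not the real ones
TEXT_LEN, IMAGE_LEN = 256, 32 * 32      # caption tokens + a 32x32 image token grid

class ToyDalle(nn.Module):
    def __init__(self, d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, d_model)
        self.pos = nn.Embedding(TEXT_LEN + IMAGE_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, tokens):                        # tokens: (batch, seq)
        seq = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(seq, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))   # next-token logits

# Training pairs a caption's text tokens with the image's discrete-VAE tokens in one
# sequence; at sampling time image tokens are drawn one by one and decoded by the VAE.
model = ToyDalle()
dummy = torch.randint(0, TEXT_VOCAB, (1, TEXT_LEN + IMAGE_LEN))
print(model(dummy).shape)   # (1, 1280, TEXT_VOCAB + IMAGE_VOCAB)
```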